1. Introduction


Newton's method for finding a zero of a function of one variable is the prototype algorithm. There is a large body of literature dealing with Newton's method for solving n simultaneous equations in n unknowns (see [17] for material relating to this). Since finding the unconstrained minimizer of a twice differentiable function of several variables involves finding a point where the first partial derivatives vanish, Newton's method is applied toward solving this problem also. In this paper Newton's method is analyzed, with computationally important variations for the case when the Hessian matrix of the function is occasionally indefinite.

In Section 2 Newton's method is derived from a natural point of view using the "gradient path" approach. There is an interesting connection between the classical Cauchy [3] method of steepest descent and Newton's method. This point of view is helpful in Section 3, where modifications to the basic approach are discussed in order to obtain convergence to a second order point (one where the gradient vector vanishes and, in addition, the Hessian matrix is positive semi-definite). The different strategies for doing this involve directions of negative curvature. The strategies are compared in a simple example.
2. A Cauchy-Newton Approach to Unconstrained Minimization

A natural way to develop a method for minimizing an unconstrained function f(x) is to consider a physical situation. The trajectory of a boulder down the side of a mountain (with the boulder restrained by ropes) would approximately satisfy the differential equation

$$\dot{x}(t) = -\nabla f[x(t)]. \tag{1}$$


In general it is not possible to obtain a solution of (1) in closed form. When f(x) is a quadratic form there is a general solution, since the differential equations are linear. Let $x(0) = x_0$ be the initial point, and let $E \Lambda E^T$ be an eigenvector-eigenvalue decomposition of $\nabla^2 f$, i.e., $\Lambda$ is the diagonal matrix of eigenvalues $\lambda_j$, and $E E^T = I$ with E the matrix of eigenvectors. Since f is assumed quadratic, this decomposition is independent of x.

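It is well known (see [4]) that the solution of (1) is

$$x(t) = x_0 - E\, Y(t)\, E^T \nabla f(x_0), \tag{2}$$

where Y(t) is a diagonal matrix whose jth diagonal element $y_j(t)$ is

$$y_j(t) = \begin{cases} (1 - e^{-\lambda_j t})/\lambda_j, & \text{if } \lambda_j \neq 0, \\ t, & \text{if } \lambda_j = 0. \end{cases}$$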
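For a quadratic f, formula (2) can be evaluated directly once the eigenpairs of the Hessian are known. The following is a minimal Python sketch, assuming the eigenpairs are supplied as pairs $(\lambda_j, e_j)$ with orthonormal $e_j$; the helper names `y_coeff` and `gradient_path` are illustrative and do not come from the paper. It evaluates $y_j(t)$ through `math.expm1` so that the value stays accurate when $\lambda_j t$ is tiny.

```python
import math


def y_coeff(lam, t):
    """Diagonal element y_j(t) of Y(t) in (2).

    -expm1(-lam*t)/lam equals (1 - exp(-lam*t))/lam but avoids cancellation
    when lam*t is tiny; lam == 0 uses the limiting value t.
    """
    if lam == 0.0:
        return t
    return -math.expm1(-lam * t) / lam


def gradient_path(x0, grad0, eigpairs, t):
    """Evaluate x(t) = x0 - E Y(t) E^T grad f(x0), i.e. formula (2).

    eigpairs is a list of (lambda_j, e_j) with orthonormal e_j (the columns
    of E and the diagonal of Lambda), assumed to be computed beforehand.
    """
    x = list(x0)
    for lam, e in eigpairs:
        # coefficient of e_j: y_j(t) * (e_j^T grad f(x0))
        coef = y_coeff(lam, t) * sum(ei * gi for ei, gi in zip(e, grad0))
        for i, ei in enumerate(e):
            x[i] -= coef * ei
    return x


# f(x) = x1^2 + 2*x2^2: Hessian diag(2, 4), gradient (2, 4) at x0 = (1, 1).
pairs = [(2.0, [1.0, 0.0]), (4.0, [0.0, 1.0])]
x_mid = gradient_path([1.0, 1.0], [2.0, 4.0], pairs, 0.3)
x_far = gradient_path([1.0, 1.0], [2.0, 4.0], pairs, 50.0)
print(x_mid, x_far)
```

For this quadratic the exact solution of (1) is $x(t) = (e^{-2t}, e^{-4t})^T$, so `x_mid` should agree with $(e^{-0.6}, e^{-1.2})^T$, and `x_far` should be essentially the minimizer $(0, 0)^T = x_0 - (\nabla^2 f)^{-1}\nabla f(x_0)$.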


For small t, (2) yields

$$x(t) \approx x_0 - \nabla f(x_0)\, t,$$

since $y_j(t) \approx t$ when $\lambda_j t$ is small.


This is similar to the algorithm known as Cauchy's method of steepest descent [3]. One version of this algorithm generates a sequence of minimizing points as

$$x_{k+1} = x_k - \nabla f(x_k)\, t_k, \quad k = 0, 1, \ldots, \tag{3}$$

where each $t_k$ is obtained from the following step size problem (with $s_k = -\nabla f(x_k)$).


SSP I (Step Size Problem I). At a point $x_k$, given a direction of search $s_k$, set $x_{k+1} = x_k + s_k t_k$, where $t_k$ is a local solution to

$$\min_{t > 0} f(x_k + s_k t).$$
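SSP I only asks for a local solution of a one-dimensional problem. One common way to approximate it is golden-section search on a bracket $[0, t_{\max}]$ on which $\varphi(t) = f(x_k + s_k t)$ is unimodal. That technique, the bracket and the names below are assumptions made for illustration; the text does not prescribe how the one-dimensional problem is solved. The usage lines apply the sketch to the objective of the Example, along the ray $(-1,-1)^T$ from $(0, .5)^T$.

```python
import math


def golden_section(phi, lo, hi, tol=1e-10):
    """Approximate a local minimizer of phi on [lo, hi].

    Assumes phi is unimodal on the bracket; each pass keeps the
    sub-interval that must contain the minimizer.
    """
    r = (math.sqrt(5.0) - 1.0) / 2.0  # inverse golden ratio, about 0.618
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:  # minimizer lies in [a, d]
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:  # minimizer lies in [c, b]
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def ssp1(f, x, s, t_max):
    """SSP I: return (x + s*t, t) with t approximately minimizing f(x + s*t)."""
    t = golden_section(lambda t: f([xi + si * t for xi, si in zip(x, s)]), 0.0, t_max)
    return [xi + si * t for xi, si in zip(x, s)], t


def f_example(x):
    # Objective of the Example (its bound constraints are ignored here).
    return math.sin(x[0] + x[1]) + (x[0] - x[1]) ** 2 - 1.5 * x[0] + 2.5 * x[1]


x_new, t_k = ssp1(f_example, [0.0, 0.5], [-1.0, -1.0], t_max=2.0)
print(t_k, x_new)
```

Along this ray $x_1 - x_2$ stays at $-.5$, and setting the derivative to zero gives $t = (.5 + 2\pi/3)/2 \approx 1.2972$, which the search should reproduce; the resulting point is approximately $(-1.2972, -.7972)^T$.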

An  important  distinction  should  be  made  between  the  continuous  form 
of  steepest  descent  given  by  (1)  and  the  discrete  form  in  (3).  The  former 
is  suitable  for  implementation  on  an  analogue  computer.  There  are  not  many 
published  results  on  this.  One  such  experiment  (Fiacco  and  McCormick  [6], 
Section  7.3)  demonstrated  the  feasibility  of  this  approach.  Implementation 
of  this,  however,  requires  an  extraordinary  amount  of  equipment  and  time. 

The  discrete  approach  is  easily  implemented  on  a digital  computer  but  is 
notoriously  slow.  Another  important  point  is  that  the  solution  of  the  first 
order  continuous  differential  equations  seems  to  imply  that  the  appropriate 
algorithm  for  implementation  on  a digital  computer  is  not  the  discrete  form 
of  steepest  descent,  but  rather  some  form  of  Newton's  method.  With  this  in 
mind,  analysis  of  (1)  and  (2)  is  continued. 
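The slowness of the discrete form (3) is easy to see on an ill-conditioned quadratic. The sketch below uses a hypothetical test problem, $f(x) = \tfrac12 \sum_i h_i x_i^2$ with $H = \mathrm{diag}(1, 100)$, and hypothetical helper names. It runs steepest descent with exact step sizes, which for a quadratic have the closed form $t_k = g^T g / g^T H g$, and compares it with one step of Newton's method.

```python
import math


def steepest_descent_iterations(h_diag, x0, rel_tol=1e-6, max_iter=100000):
    """Run (3) with exact step sizes on f(x) = 0.5 * sum(h_i * x_i**2).

    Returns the number of iterations needed to shrink ||x|| (the distance
    to the minimizer 0) below rel_tol * ||x0||.
    """
    x = list(x0)
    norm0 = math.sqrt(sum(v * v for v in x))
    for k in range(max_iter):
        if math.sqrt(sum(v * v for v in x)) <= rel_tol * norm0:
            return k
        g = [h * v for h, v in zip(h_diag, x)]  # gradient H x
        # exact minimizer of f(x - t g) for a quadratic: t = g^T g / g^T H g
        t = sum(gi * gi for gi in g) / sum(h * gi * gi for h, gi in zip(h_diag, g))
        x = [v - t * gi for v, gi in zip(x, g)]
    return max_iter


kappa = 100.0  # condition number of H = diag(1, kappa)
h, x0 = [1.0, kappa], [kappa, 1.0]
sd_iters = steepest_descent_iterations(h, x0)
# Newton's method: x0 - H^{-1} grad f(x0) = x0 - x0, the minimizer, in one step.
x_newton = [v - (hi * v) / hi for hi, v in zip(h, x0)]
print(sd_iters, x_newton)
```

From this starting point every steepest descent step shrinks the error by exactly $(\kappa - 1)/(\kappa + 1) = 99/101$, so roughly 690 iterations are needed, against a single Newton step.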

When t is large the trajectory x(t) depends upon the signs and magnitudes of the eigenvalues. In the case when $\nabla^2 f$ is a positive definite matrix, i.e., when $\lambda_j > 0$ for all j, each $y_j(t) \to 1/\lambda_j$ as $t \to \infty$, so that

$$x(\infty) = x_0 - (\nabla^2 f)^{-1} \nabla f(x_0).$$

This is the algorithm prescribed by the classical version of Newton's method, which minimizes a positive definite quadratic form in one iteration.

The classical method without modifications iterates as

$$x_{k+1} = x_k - [\nabla^2 f(x_k)]^{-1} \nabla f(x_k), \quad k = 0, 1, \ldots. \tag{4}$$


T-  343 


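Near a point $x^*$ where $\nabla f(x^*) = 0$ and $\nabla^2 f(x^*)$ is positive definite, the rate of convergence is "at least quadratic," i.e., there is a value M > 0 such that

$$\|x_{k+1} - x^*\| \le M \|x_k - x^*\|^2$$

for all $x_k$ sufficiently close to $x^*$.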

3. Global Convergence Using Directions of Negative Curvature

There  are  many  theoretical  and  computational  objections  to  the  use 
of  Newton's  method.  The  computational  objections  will  not  be  taken  up  in 
this  paper;  here  the  theoretical  difficulties  are  discussed  and  the  basic 
method  modified  so  that  convergence  to  a second  order  point  can  be  obtained. 

When $x_k$ is far away from an isolated local minimizer the Hessian matrix may be indefinite, and occasionally may even be singular. In this instance the traditional move may not be a descent direction (and in the singular case is not defined). Furthermore, even if the Hessian is positive definite, the quadratic approximation may be so poor that little progress is made. Since the computation of the Newton direction is relatively expensive and since it is not known how close the current point is to the isolated local minimizer, it is argued that a simpler algorithm is to be preferred. Another problem (which may be more computational than theoretical) is that the condition number of the Hessian may be so high that the numerical procedure for computing the Newton direction gives a false indication that the Hessian is not positive definite. There is no way, using the traditional Newton equation, that this possibility can be adequately handled.

These objections have led investigators recently to propose modifications to the basic algorithm. Indeed, any computer program which implements a form of Newton's method must take these difficulties into account.

The problem of what form Newton's method should take when, at the point $x_k$, the Hessian matrix $\nabla^2 f(x_k)$ is not positive definite has been investigated by several people. The strategies covered by these papers fall into five general categories.


A. When in the course of computing the inverse Hessian (usually in some implicit form) an indication occurs that the Hessian is not positive definite, force the numerical procedure to generate a positive definite matrix. The reasoning behind this strategy is that in most cases when this occurs, it is because of numerical roundoff errors (caused sometimes by ill-conditioning) and that this will tend to correct the roundoff problem. In any event, it is argued, the resulting direction will be one of descent.

This strategy will not be pursued further here, except to note that compared to those discussed below it is wasteful of information. The same numerical procedure that is used to get the inverse Hessian in the positive definite case should be able to compute information that will hasten the search for the minimizer. There is no reason why the descent direction above will be any better than, say, the steepest descent direction when the Hessian is indefinite. In other words, if the generation of a direction of descent is the only concern, there are cheaper ways to do it. For more information on these techniques the reader is referred to Matthews and Davies [13], Greenstadt [10], Levenberg [11], and Marquardt [12].

B. When it is discovered that the Hessian matrix is not positive definite, modify the numerical procedure and compute $d_k$, a descent direction of negative curvature, i.e., a vector such that

$$d_k^T \nabla^2 f(x_k)\, d_k < 0$$

and

$$d_k^T \nabla f(x_k) \le 0.$$

Set $s_k = d_k$ and find $t_k$, the step size scalar, by using SSP I.
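For a 2 × 2 Hessian, a descent direction of negative curvature can be taken from the eigenvector of the smallest eigenvalue, with its sign flipped if necessary so that $d_k^T \nabla f(x_k) \le 0$. This is one possible choice, and an expensive one for large n. The sketch below, with hypothetical helper names, uses the closed-form eigenpairs of a symmetric 2 × 2 matrix and the rounded gradient and Hessian reported at $x_0$ in the Example.

```python
import math


def sym2_eig(h):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, c]], smallest first."""
    (a, b), (_, c) = h
    mean, half_diff = 0.5 * (a + c), 0.5 * (a - c)
    rad = math.hypot(half_diff, b)
    lam_min, lam_max = mean - rad, mean + rad
    if b == 0.0:  # already diagonal
        e_min = [1.0, 0.0] if a <= c else [0.0, 1.0]
    else:
        v = [b, lam_min - a]  # satisfies (a - lam_min) v1 + b v2 = 0
        n = math.hypot(v[0], v[1])
        e_min = [v[0] / n, v[1] / n]
    e_max = [-e_min[1], e_min[0]]  # orthogonal unit vector
    return [(lam_min, e_min), (lam_max, e_max)]


def negative_curvature_direction(g, h):
    """Return d with d^T H d < 0 and d^T g <= 0, or None if H is PSD."""
    lam_min, e = sym2_eig(h)[0]
    if lam_min >= 0.0:
        return None
    if e[0] * g[0] + e[1] * g[1] > 0.0:
        e = [-e[0], -e[1]]  # flip so that d is a descent direction
    return e


g0 = [-1.6224, 4.3776]  # gradient at x0 (from the Example)
h0 = [[1.520574, -2.479426], [-2.479426, 1.520574]]  # Hessian at x0
d0 = negative_curvature_direction(g0, h0)
curv = sum(d0[i] * h0[i][j] * d0[j] for i in range(2) for j in range(2))
print(d0, curv)
```

Here $d_0$ comes out as $-(1, 1)^T/\sqrt{2}$, the unit vector along the ray $(-1,-1)^T$, with curvature $d_0^T \nabla^2 f(x_0) d_0$ equal to the eigenvalue $-.958852$.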

The motivation behind this strategy is to hasten the search for a region in which the Hessian matrix of f is positive definite, so that the classical approach will apply and an ultimate quadratic rate of convergence will be obtained. The reason this strategy should accomplish that is that the direction $d_k$ is one in which the function decreases, and also one (at least initially) in which the slope $df/dt$ is itself decreasing. To see this, simply note that

$$\frac{d f(x_k + d_k t)}{dt} = d_k^T \nabla f(x_k) \le 0 \quad \text{at } t = 0,$$

and

$$\frac{d^2 f(x_k + d_k t)}{dt^2} = d_k^T \nabla^2 f(x_k)\, d_k < 0 \quad \text{at } t = 0.$$

If the problem is well posed, the function is bounded below and $\{x \mid f(x) \le f(x_k)\}$ is a bounded set. Thus, ultimately the curvature of f in the direction $d_k$ emanating from $x_k$ will become nonnegative.

If the directions $\{d_k\}$ are chosen with care, eventually the sequence of minimizing points will enter a region in which the Hessian is positive (semi)definite and remain there. It is not difficult to compute a direction of negative curvature; what is difficult is to compute one which has some resemblance to an eigenvector of the Hessian associated with its minimum eigenvalue. When it does, the minimum eigenvalue of the Hessian can be expected to increase each time the above strategy is used. Eventually the minimum eigenvalue is brought above zero, as hoped for.

Intuitively this strategy makes sense. Theorems can be used to prove second order convergence of the strategy if the directions $\{d_k\}$ have certain properties. If one is willing to go to the trouble to compute an eigenvector of the Hessian associated with its minimum eigenvalue, convergence (except under pathological circumstances) can be established.

The problem is how to obtain a good direction of negative curvature while using no more arithmetic operations than would be required to compute the usual Newton vector. This matter has been discussed elsewhere [15]. For recent attempts to handle this problem the reader is referred to Gill and Murray [8], Fletcher and Freeman [7], Fiacco and McCormick [6], and the survey by Murray [16].

C. The most appealing strategy is based on (2). If (2) is used at each iterate, the continuous steepest descent trajectory, or "gradient path," is approximated by the gradient path of a quadratic approximation at the point $x_k$. This has many desirable features. For one thing, the method is not as sensitive to numerical roundoff errors which may give a false indication of indefiniteness. It is easy to show that

$$\lim_{\lambda_j \to 0} \frac{1 - e^{-\lambda_j t}}{\lambda_j} = t \quad \text{for all } t > 0,$$

so the diagonal elements $y_j(t)$ in (2) change continuously as an eigenvalue passes through zero, and a small eigenvalue whose computed sign is wrong has little effect.
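The limit above keeps (2) well behaved as an eigenvalue crosses zero, but only if $y_j(t)$ is evaluated carefully: the direct expression $(1 - e^{-\lambda t})/\lambda$ loses most of its digits when $\lambda t$ is tiny. A hedged Python illustration using the standard `math.expm1` function (the helper names are hypothetical):

```python
import math


def y_naive(lam, t):
    # direct transcription of (1 - exp(-lam*t)) / lam: cancels badly near lam = 0
    return (1.0 - math.exp(-lam * t)) / lam


def y_stable(lam, t):
    # same quantity via expm1(z) = exp(z) - 1, with the limit t at lam == 0
    return t if lam == 0.0 else -math.expm1(-lam * t) / lam


t = 1.0
for lam in (1e-4, 1e-8, 1e-13, 0.0, -1e-13):
    naive = y_naive(lam, t) if lam != 0.0 else float("nan")
    print(f"lam={lam:+.0e}  naive={naive:.15f}  stable={y_stable(lam, t):.15f}")
```

Both expressions agree for moderate $\lambda$, while `y_stable` passes smoothly through $\lambda = 0$: a tiny positive and a tiny negative eigenvalue give nearly the same $y_j$. This is the insensitivity to a spurious sign described in the text.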


There have been some experiments based on this approach (see [18], [9], [5], and in particular [2]). The major drawback is that it requires a full eigenvalue-eigenvector decomposition, unless techniques for exponentiating a matrix are used.

An obvious modification to this would be to do another decomposition which resembles the eigenvalue-eigenvector decomposition, using some numerically stable procedure. There are many possibilities for this, but no published work seems to have been done in this area. There is one approach which uses a quasi-Newton updating technique to approximate the inverse Hessian [19]. Some numerical experience has been reported there.

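The formal statement of this approach is: let $x_k(t)$ be the solution given by (2) to the quadratic approximation problem at $x_k$ (that is, with $x_0$ replaced by $x_k$ and the Hessian evaluated at $x_k$). Set $x_{k+1} = x_k(t_k)$, where $t_k$ is a local solution to

$$\min_{t > 0} f[x_k(t)].$$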
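One iteration of Strategy C can be sketched for the objective of the Example. Its Hessian is $\begin{pmatrix} 2-\sigma & -2-\sigma \\ -2-\sigma & 2-\sigma \end{pmatrix}$ with $\sigma = \sin(x_1 + x_2)$, so its eigenvectors are always $(1,1)^T/\sqrt{2}$ and $(1,-1)^T/\sqrt{2}$, with eigenvalues $-2\sigma$ and 4. The path $x_k(t)$ from (2) is searched by golden section on an assumed bracket [0, 1]; the names and the bracket are hypothetical choices. The result can be set beside point A of the Example, but the sketch is not claimed to reproduce the digits reported there.

```python
import math


def f(x):
    return math.sin(x[0] + x[1]) + (x[0] - x[1]) ** 2 - 1.5 * x[0] + 2.5 * x[1]


def grad(x):
    c = math.cos(x[0] + x[1])
    return [c + 2.0 * (x[0] - x[1]) - 1.5, c - 2.0 * (x[0] - x[1]) + 2.5]


def eigpairs(x):
    """Eigenpairs of the Hessian of f at x (closed form for this f)."""
    s, r = math.sin(x[0] + x[1]), 1.0 / math.sqrt(2.0)
    return [(-2.0 * s, [r, r]), (4.0, [r, -r])]


def quad_path(xk, t):
    """x_k(t): formula (2) applied to the quadratic approximation of f at xk."""
    g, x = grad(xk), list(xk)
    for lam, e in eigpairs(xk):
        y = t if lam == 0.0 else -math.expm1(-lam * t) / lam
        coef = y * (e[0] * g[0] + e[1] * g[1])
        x = [x[0] - coef * e[0], x[1] - coef * e[1]]
    return x


def golden_section(phi, lo, hi, tol=1e-9):
    """Local minimizer of phi on [lo, hi], assuming phi is unimodal there."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


x0 = [0.0, 0.5]
t0 = golden_section(lambda t: f(quad_path(x0, t)), 0.0, 1.0)
x1 = quad_path(x0, t0)
print(t0, x1, f(x1))
```

Because the eigenvalue $-2\sigma$ is negative at $x_0$, the factor $y_1(t)$ grows exponentially and the path bends toward $(-1,-1)^T$, which is why a bounded search interval is used.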

D. The fourth strategy is to create a trajectory which is a combination of a descent trajectory and a trajectory given by a descent direction of negative curvature. The desire to simultaneously minimize the function in the directions in which the Hessian has positive eigenvalues and to move also in a direction of negative curvature can take several forms. Below is one form for which convergence to a second order point can be proved.

Two new step size procedures are necessary to implement this strategy. They are generalizations by McCormick [14] of those suggested by Armijo [1].


SSP II. At iteration k, given $x_k$ and $s_k$, a descent direction of search emanating from it, find i(k), the smallest integer from i = 0, 1, ... such that

$$f(x_k + s_k 2^{-i}) - f(x_k) \le \alpha\, 2^{-i} s_k^T \nabla f(x_k),$$

where $\alpha$ is a preassigned constant with $0 < \alpha < 1$. Set

$$x_{k+1} = x_k + s_k 2^{-i(k)}.$$
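A direct transcription of SSP II, assuming Python callables for f and its gradient. The toy function $f(x, y) = x^2 + y^4/4 - y^2/2$, the default $\alpha = 0.4$ and the cap on the number of halvings are hypothetical choices made for illustration.

```python
def ssp2(f, grad_f, x, s, alpha=0.4, max_halvings=60):
    """SSP II: smallest i >= 0 with
    f(x + s 2^-i) - f(x) <= alpha * 2^-i * s^T grad f(x).
    Returns (x_{k+1}, i(k)); s must be a descent direction."""
    fx = f(x)
    slope = sum(si * gi for si, gi in zip(s, grad_f(x)))  # negative for descent
    for i in range(max_halvings + 1):
        step = 2.0 ** (-i)
        trial = [xi + si * step for xi, si in zip(x, s)]
        if f(trial) - fx <= alpha * step * slope:
            return trial, i
    raise RuntimeError("no acceptable step found; is s a descent direction?")


def f_toy(x):
    return x[0] ** 2 + x[1] ** 4 / 4.0 - x[1] ** 2 / 2.0


def grad_toy(x):
    return [2.0 * x[0], x[1] ** 3 - x[1]]


x_next, i_k = ssp2(f_toy, grad_toy, [1.0, 0.0], [-2.0, 0.0])
print(i_k, x_next)
```

With $s = -\nabla f(1, 0) = (-2, 0)^T$ the full step (i = 0) overshoots to $(-1, 0)$ without decreasing f, and the first halving (i = 1) is accepted.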


SSP III. At iteration k, given $x_k$, $s_k$ a descent direction, and $d_k$ a descent direction of negative curvature, find i(k), the smallest integer from i = 0, 1, ..., such that

$$f[y_k(2^{-i})] - f(x_k) \le \alpha\, 2^{-i} \left[ s_k^T \nabla f(x_k) + \tfrac{1}{2} d_k^T \nabla^2 f(x_k)\, d_k \right],$$

where $y_k(2^{-i}) = x_k + s_k 2^{-i} + d_k 2^{-i/2}$. Set

$$x_{k+1} = x_k + s_k 2^{-i(k)} + d_k 2^{-i(k)/2}.$$

(Here again, $\alpha$ is a preassigned constant with $0 < \alpha < 1$.)
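SSP III can be transcribed in the same way. The step along $d_k$ shrinks like $2^{-i/2}$, so its second order contribution $\tfrac12 2^{-i} d_k^T \nabla^2 f(x_k) d_k$ is of the same order as the first order term $2^{-i} s_k^T \nabla f(x_k)$. In this sketch the two quantities $s_k^T \nabla f(x_k)$ and $d_k^T \nabla^2 f(x_k) d_k$ are passed in precomputed; the toy function, $\alpha = 0.4$ and the names are hypothetical.

```python
import math


def ssp3(f, x, s, d, slope, curv, alpha=0.4, max_halvings=60):
    """SSP III: smallest i >= 0 with
    f(x + s 2^-i + d 2^(-i/2)) - f(x) <= alpha * 2^-i * (slope + 0.5 * curv),
    where slope = s^T grad f(x) and curv = d^T Hess f(x) d < 0."""
    fx = f(x)
    for i in range(max_halvings + 1):
        step = 2.0 ** (-i)
        root = math.sqrt(step)  # 2^(-i/2)
        trial = [xi + si * step + di * root for xi, si, di in zip(x, s, d)]
        if f(trial) - fx <= alpha * step * (slope + 0.5 * curv):
            return trial, i
    raise RuntimeError("no acceptable step found")


def f_toy(x):
    return x[0] ** 2 + x[1] ** 4 / 4.0 - x[1] ** 2 / 2.0


# At (1, 0): gradient (2, 0), Hessian diag(2, -1).
s, d = [-2.0, 0.0], [0.0, 1.0]
x_next, i_k = ssp3(f_toy, [1.0, 0.0], s, d, slope=-4.0, curv=-1.0)
print(i_k, x_next)
```

Here $d = (0, 1)^T$ satisfies $d^T \nabla f = 0$ and $d^T \nabla^2 f\, d = -1 < 0$. The accepted point $(0, 1/\sqrt{2})$ has left the line y = 0, which pure gradient steps from (1, 0) would never leave because $\partial f/\partial y = 0$ there.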


Algorithm. Let $x_0$ be a given point. In general, at iteration k there is available a point $x_k$. Movement to $x_{k+1}$ takes a different form depending upon which of two cases holds.

Case (i): The Hessian matrix $\nabla^2 f(x_k)$ is positive definite. Set $s_k = -[\nabla^2 f(x_k)]^{-1} \nabla f(x_k)$ and obtain $x_{k+1}$ using SSP II.

Case (ii): The Hessian matrix $\nabla^2 f(x_k)$ is not positive definite. Compute a descent direction $s_k$ and a descent direction of negative curvature $d_k$. Use SSP III to obtain $x_{k+1}$.

A specific realization of $s_k$ and $d_k$ in Case (ii) would be the negative gradient $-\nabla f(x_k)$ and an eigenvector of $\nabla^2 f(x_k)$ associated with its minimum eigenvalue (with the sign chosen appropriately to make it a descent direction).
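Putting the pieces together, the two-case algorithm can be sketched as below, using the specific Case (ii) realization ($s_k = -\nabla f(x_k)$, $d_k = \pm$ the eigenvector of the smallest eigenvalue). Everything problem-specific is hypothetical: the toy function $f(x, y) = x^2 + y^4/4 - y^2/2$, the stopping rule and tolerance, $\alpha = 0.4$, and the fallback to plain SSP II on $-\nabla f$ when the smallest eigenvalue is exactly zero.

```python
import math


# Hypothetical test problem: minimizers (0, 1) and (0, -1), saddle at (0, 0).
def f(x):
    return x[0] ** 2 + x[1] ** 4 / 4.0 - x[1] ** 2 / 2.0


def grad(x):
    return [2.0 * x[0], x[1] ** 3 - x[1]]


def hess(x):
    return [[2.0, 0.0], [0.0, 3.0 * x[1] ** 2 - 1.0]]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def min_eigpair(h):
    """Smallest eigenvalue and a unit eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, c) = h
    lam = 0.5 * (a + c) - math.hypot(0.5 * (a - c), b)
    if b == 0.0:
        return lam, ([1.0, 0.0] if a <= c else [0.0, 1.0])
    n = math.hypot(b, lam - a)
    return lam, [b / n, (lam - a) / n]


def solve2(h, g):
    """Solve the 2x2 linear system h z = g by Cramer's rule."""
    (a, b), (c, d) = h
    det = a * d - b * c
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def armijo(x, s, d, alpha):
    """SSP II when d is None, otherwise SSP III."""
    g, fx = grad(x), f(x)
    slope = dot(s, g)
    curv = 0.0 if d is None else dot(d, [dot(row, d) for row in hess(x)])
    for i in range(60):
        step = 2.0 ** (-i)
        trial = [xi + si * step for xi, si in zip(x, s)]
        if d is not None:
            trial = [ti + di * math.sqrt(step) for ti, di in zip(trial, d)]
        if f(trial) - fx <= alpha * step * (slope + 0.5 * curv):
            return trial
    raise RuntimeError("step-size search failed")


def strategy_d(x, alpha=0.4, tol=1e-7, max_iter=100):
    # tol is loose on purpose: once f changes by less than about 1e-16 the
    # Armijo test can no longer be decided in double precision.
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        lam, e = min_eigpair(h)
        if math.hypot(g[0], g[1]) < tol and lam >= 0.0:
            break  # approximate second order point
        if lam > 0.0:  # Case (i): Newton direction with SSP II
            x = armijo(x, [-v for v in solve2(h, g)], None, alpha)
        else:  # Case (ii): -gradient plus negative curvature with SSP III
            d = None  # lam == 0: assumed fallback to plain SSP II on -gradient
            if lam < 0.0:
                d = e if dot(e, g) <= 0.0 else [-e[0], -e[1]]
            x = armijo(x, [-g[0], -g[1]], d, alpha)
    return x


start = [1.0, 0.0]
newton_only = [a - b for a, b in zip(start, solve2(hess(start), grad(start)))]
x_final = strategy_d(start)
print(newton_only, x_final)
```

The classical iteration (4) from (1, 0) jumps straight to the saddle point (0, 0), where the gradient vanishes but $\nabla^2 f = \mathrm{diag}(2, -1)$ is indefinite. The sketch instead takes one Case (ii) step off the line y = 0 and then finishes with Case (i) Newton steps near the minimizer (0, 1).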


Again, the difficulty in using this strategy is in finding an efficient numerical procedure for computing $d_k$, and a natural choice for $s_k$ which allows for minimization of some "part of" f(x).

E. Another strategy, related to the fourth one, is to create an iteration which consists of several steps. Using $x_k$ as a base point, move successively, optimizing along computed directions of negative curvature. The last step of the iteration is to move so as to minimize the "positive part" of the quadratic approximation at $x_k$.

Specifically, define the positive part of the Hessian $\nabla^2 f(x_k)$ to be

$$P^k = \sum_{\lambda_j^k > 0} \lambda_j^k\, e_j^k (e_j^k)^T,$$

where $E^k \Lambda^k (E^k)^T$ is the eigenvalue-eigenvector decomposition of the Hessian matrix. Compute $\tilde{P}^k$, a positive semi-definite matrix which is an approximation to $P^k$, and $d_1^k, \ldots, d_q^k$, descent directions of negative curvature for $\nabla^2 f(x_k)$.


Set $y_1^k = x_k$. In general, for the jth step of the kth iteration, set $s_j^k = d_j^k$ (where without loss of generality it is assumed that $(d_j^k)^T \nabla f(x_k) \le 0$) and find $y_{j+1}^k$ by solving the usual step-size problem:

$$\min_{t > 0} f(y_j^k + s_j^k t). \tag{5}$$

This is to be done for j = 1, ..., q.

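At the (q+1)st step the direction of search is set equal to

$$s_{q+1}^k = -(\tilde{P}^k)^+ \nabla f(x_k),$$

where $^+$ denotes the (Moore-Penrose) pseudoinverse, and the outcome of the optimal step-size problem (5), with j = q + 1, yields a point which is taken to be $x_{k+1}$.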
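For the objective of the Example, the Hessian eigenvectors $(1, 1)^T/\sqrt{2}$ and $(1, -1)^T/\sqrt{2}$ do not depend on x, so the constant-eigenvector assumption used in the motivation below holds exactly, and one iteration of Strategy E can be sketched compactly. The sketch uses the exact positive part of the Hessian (no approximation), takes the eigenvectors with negative eigenvalues as the directions of negative curvature, and replaces the exact searches in (5) by golden section on an assumed bracket [0, 3]; all names are hypothetical.

```python
import math


def f(x):
    return math.sin(x[0] + x[1]) + (x[0] - x[1]) ** 2 - 1.5 * x[0] + 2.5 * x[1]


def grad(x):
    c = math.cos(x[0] + x[1])
    return [c + 2.0 * (x[0] - x[1]) - 1.5, c - 2.0 * (x[0] - x[1]) + 2.5]


def eigpairs(x):
    # Hessian [[2-s, -2-s], [-2-s, 2-s]] with s = sin(x1+x2): the eigenvectors
    # (1,1)/sqrt2 and (1,-1)/sqrt2 are the same at every x.
    s, r = math.sin(x[0] + x[1]), 1.0 / math.sqrt(2.0)
    return [(-2.0 * s, [r, r]), (4.0, [r, -r])]


def golden_section(phi, lo, hi, tol=1e-10):
    """Local minimizer of phi on [lo, hi], assuming phi is unimodal there."""
    r = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - r * (b - a), a + r * (b - a)
    fc, fd = phi(c), phi(d)
    while b - a > tol:
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - r * (b - a)
            fc = phi(c)
        else:
            a, c, fc = c, d, fd
            d = a + r * (b - a)
            fd = phi(d)
    return 0.5 * (a + b)


def line_min(y, s, t_max=3.0):
    """Approximate solution of (5): minimize f(y + s t) for 0 <= t <= t_max."""
    t = golden_section(lambda t: f([y[0] + s[0] * t, y[1] + s[1] * t]), 0.0, t_max)
    return [y[0] + s[0] * t, y[1] + s[1] * t]


def strategy_e_iteration(xk):
    g, pairs = grad(xk), eigpairs(xk)
    y = list(xk)
    # Steps j = 1..q: searches along eigenvectors with negative eigenvalues,
    # each signed so that d^T grad f(x_k) <= 0.
    for lam, e in pairs:
        if lam < 0.0:
            d = e if e[0] * g[0] + e[1] * g[1] <= 0.0 else [-e[0], -e[1]]
            y = line_min(y, d)
    # Step q+1: s = -(P^k)^+ grad f(x_k), P^k = positive part of the Hessian.
    s = [0.0, 0.0]
    for lam, e in pairs:
        if lam > 0.0:
            c = (e[0] * g[0] + e[1] * g[1]) / lam
            s = [s[0] - c * e[0], s[1] - c * e[1]]
    return line_min(y, s), y, s


x1, point_c, s_last = strategy_e_iteration([0.0, 0.5])
print(point_c, s_last, x1)
```

Starting from $(0, .5)^T$, the first search should stop near $(-1.2972, -.7972)^T$, the last direction should be $(.75, -.75)^T$, and the final point should be the unconstrained minimizer $(-.5472, -1.5472)^T$.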

Partial motivation for this algorithm can be given as follows. Suppose at $x_k$ there are q negative eigenvalues associated with its Hessian matrix. Suppose further that in general the Hessian matrix of f at any point x can be approximated by

$$\nabla^2 f(x) = E\, \Lambda(x)\, E^T, \tag{6}$$

where $E E^T = I$ and $\Lambda(x)$ is a diagonal matrix. This is tantamount to assuming that although the eigenvalues of the Hessian matrix of f(x) may vary from point to point, the associated eigenvectors do not change very much.


For definiteness assume that

$$\lambda_j(x_k) < 0, \quad j = 1, \ldots, q,$$

$$\lambda_j(x_k) > 0, \quad j = q+1, \ldots, n.$$

Set $d_j^k = \pm e_j$ (the sign chosen so that $(d_j^k)^T \nabla f(x_k) \le 0$). Because the optimal step-size procedure (5) is used, it follows that

$$e_j^T \nabla f(y_{j+1}^k) = 0, \quad j = 1, \ldots, q. \tag{7}$$

Because of the approximation above it follows that

$$\nabla f(y_{j+1}^k) = \nabla f(y_j^k) + \int_0^1 E\, \Lambda[y_j^k + (y_{j+1}^k - y_j^k)s]\, E^T\, ds\; (d_j^k t_j^k), \quad j = 1, \ldots, q, \tag{8}$$

where $t_j^k$ is the step size found in (5), so that $y_{j+1}^k - y_j^k = d_j^k t_j^k$. A simple induction argument shows that

$$e_j^T \nabla f(y_j^k) = e_j^T \nabla f(x_k), \quad j = 1, \ldots, q. \tag{9}$$

If the problem is well posed, i.e., if $\{x \mid f(x) \le f(x_k)\}$ is a bounded set, then the step-size problem terminates at a finite point and the second order necessary conditions imply

$$e_j^T \nabla^2 f(y_{j+1}^k)\, e_j = \lambda_j(y_{j+1}^k) \ge 0, \quad j = 1, \ldots, q. \tag{10}$$
Thus this strategy brings the negative eigenvalues up to at least zero. Assume that strict inequality holds in (10). Assume further that the eigenvalues which were strictly positive have not changed at all (or not very much). Then at $y_{q+1}^k$, if the new information were recomputed, a usual Newton move would be made. We shall now show that no new computations are necessary, i.e., that step (q+1) accomplishes this.


The Newton direction at $y_{q+1}^k$ is given by

$$-\sum_{i=1}^{n} e_i\, \lambda_i(y_{q+1}^k)^{-1}\, e_i^T \nabla f(y_{q+1}^k). \tag{11}$$

Now because of (7),

$$e_q^T \nabla f(y_{q+1}^k) = 0.$$

For $1 \le j < q$, by virtue of (8) and the fact that $e_i^T e_j = 0$ for $i \ne j$, induction yields

$$e_j^T \nabla f(y_{q+1}^k) = e_j^T \nabla f(y_{j+1}^k) = 0$$

(using (7) again). Thus (11) above is equivalent to

$$-\sum_{i=q+1}^{n} e_i\, \lambda_i(y_{q+1}^k)^{-1}\, e_i^T \nabla f(y_{q+1}^k).$$

The fact that $e_i^T e_j = 0$, for $i \ne j$, coupled with (8), readily yields

$$e_j^T \nabla f(y_{q+1}^k) = e_j^T \nabla f(x_k), \quad j = q+1, \ldots, n.$$

This, coupled with the assumption that the eigenvalues which were positive at $x_k$ do not change much (i.e., $\lambda_j(y_{q+1}^k) \approx \lambda_j(x_k)$ for $j = q+1, \ldots, n$), implies that

$$-\sum_{i=q+1}^{n} e_i\, \lambda_i(y_{q+1}^k)^{-1}\, e_i^T \nabla f(y_{q+1}^k) \approx -\sum_{i=q+1}^{n} e_i\, \lambda_i(x_k)^{-1}\, e_i^T \nabla f(x_k),$$

which is just $-(P^k)^+ \nabla f(x_k)$.
Example. The nonlinear programming problem is:

$$\min_{(x_1, x_2)} \; \sin(x_1 + x_2) + (x_1 - x_2)^2 - \tfrac{3}{2} x_1 + \tfrac{5}{2} x_2.$$

The vector $(x_1, x_2)^T$ is further restricted so that $x_1 \ge -1.5$ and $x_2 \ge -2.5$ must hold. This is a nonconvex programming problem and has an infinite number of local minimizers. The plot of the problem is in Figure 1. Assume that the starting point of the process is $x_0 = (0, .5)^T$. The solution of the differential equation (1) starting from that point is given by

$$x_1(t) = [y_1(t) + y_2(t)]/2, \qquad x_2(t) = [y_1(t) - y_2(t)]/2,$$

where

$$y_1(t) = 2 \arctan\left( \sqrt{3}\, \frac{1 - \exp(\sqrt{3}\, t - .297007)}{\exp(\sqrt{3}\, t - .297007) + 1} \right)$$

and

$$y_2(t) = 1 - 1.5 \exp(-4t).$$

As t approaches infinity this trajectory approaches the local unconstrained minimizer $(-.5471975512, -1.547197551)^T$. The trajectory is plotted (the dotted curve) in Figure 1.
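A closed-form solution of (1) for this problem can be checked against direct numerical integration. The sketch below integrates (1) from $x_0 = (0, .5)^T$ with the classical fourth-order Runge-Kutta scheme (a standard choice, not one used in the paper) and an assumed step of $10^{-3}$. It codes $y_1(t) = 2\arctan\{\sqrt{3}[1 - \exp(\sqrt{3}t - .297007)]/[\exp(\sqrt{3}t - .297007) + 1]\}$ and $y_2(t) = 1 - 1.5\exp(-4t)$. The latter follows from (1) because $y_2 = x_1 - x_2$ satisfies $dy_2/dt = 4 - 4y_2$ with $y_2(0) = -.5$. The helper names are hypothetical.

```python
import math


def grad(x):
    c = math.cos(x[0] + x[1])
    return [c + 2.0 * (x[0] - x[1]) - 1.5, c - 2.0 * (x[0] - x[1]) + 2.5]


def closed_form(t):
    """x(t) = ((y1 + y2)/2, (y1 - y2)/2) for the Example, x(0) = (0, .5)."""
    w = math.exp(math.sqrt(3.0) * t - 0.297007)
    y1 = 2.0 * math.atan(math.sqrt(3.0) * (1.0 - w) / (w + 1.0))
    y2 = 1.0 - 1.5 * math.exp(-4.0 * t)
    return [(y1 + y2) / 2.0, (y1 - y2) / 2.0]


def rk4(x, t_end, h=1e-3):
    """Integrate dx/dt = -grad f(x), equation (1), with classical RK4."""
    def rhs(z):
        g = grad(z)
        return [-g[0], -g[1]]

    for _ in range(int(round(t_end / h))):
        k1 = rhs(x)
        k2 = rhs([x[i] + 0.5 * h * k1[i] for i in range(2)])
        k3 = rhs([x[i] + 0.5 * h * k2[i] for i in range(2)])
        k4 = rhs([x[i] + h * k3[i] for i in range(2)])
        x = [x[i] + h * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i]) / 6.0 for i in range(2)]
    return x


x_numeric = rk4([0.0, 0.5], 1.0)
x_closed = closed_form(1.0)
x_limit = closed_form(20.0)
print(x_numeric, x_closed, x_limit)
```

The constant .297007 is rounded, so agreement with the numerical solution is expected to about six digits, and for large t the closed form settles at the minimizer $(-.5471975512, -1.547197551)^T$.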


At $x_0$, $\nabla f(x_0)^T = (-1.6224, 4.3776)$, and

$$\nabla^2 f(x_0) = \begin{pmatrix} 1.520574 & -2.479426 \\ -2.479426 & 1.520574 \end{pmatrix}.$$

This has an eigenvalue (-.95885) in the direction $(1,1)^T$, and an eigenvalue (4.) in the direction $(1,-1)^T$. For simplicity the normalization of the eigenvectors will be incorporated into the diagonal matrix. The trajectory given by the quadratic approximation method (Strategy C) is

$$x(t) = \begin{pmatrix} 0 \\ .5 \end{pmatrix} - \begin{pmatrix} 1 & 1 \\ -1 & 1 \end{pmatrix} \begin{pmatrix} (1 - e^{-4t})/8 & 0 \\ 0 & -(1 - e^{.95885 t})/1.9177 \end{pmatrix} \begin{pmatrix} 1 & -1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} -1.6224 \\ 4.3776 \end{pmatrix}.$$

This trajectory is plotted in Figure 1. The point to notice is how closely this follows the solution of the differential equation. Initially the tangent to the trajectory is the steepest descent direction, and later it turns to point in the direction $(-1,-1)^T$. It passes very close to the unconstrained minimizer. The solution of the optimal step-size problem is at $t_0 = .67$ with $x_0(t_0) = (-.67, -1.50)^T$. The objective function value there is $f[x_0(t_0)] = -2.85$. This is given as point A in Figure 1.

[Figure 1. Results of different strategies for a nonconvex unconstrained minimization problem. Legend: solution of $\dot{x} = -\nabla f[x(t)]$, $x(0) = x_0$; solution of $\dot{x} = -\nabla Q_0[x(t)]$, where $Q_0(x)$ is quadratic; Armijo (second order) points; solution of the second order Armijo step-size procedure; minimizers of f along search directions. Objective: $\sin(x_1 + x_2) + (x_1 - x_2)^2 - 3x_1/2 + 5x_2/2$, subject to the bounds above.]

Use of Strategy B on this example involves minimizing f along the ray $(-1,-1)^T$ starting from $(0, .5)^T$. The result of this is point C in Figure 1, approximately $(-1.2972, -.7972)^T$. The region of positive definiteness has been found and the regular Case (i) version of Newton's method would apply now.

The results of applying Strategy D are summarized in Table 1 and are also given in Figure 1. A value of $\alpha = 1/2$ was used in the Armijo procedure. For i = 0 the point generated was outside the given bounds. The threshold criterion failed for i = 1 but passed for i = 2. The terminal point is labelled B in the figure and is approximately $(.0520, -1.43795)^T$.

Applying the classical version of Newton's method (4) at this point yields

$$\begin{pmatrix} .0520 \\ -1.43795 \end{pmatrix} - \begin{pmatrix} 2.982964 & -1.017035 \\ -1.017035 & 2.982964 \end{pmatrix}^{-1} \begin{pmatrix} 1.6636955 \\ -.2961045 \end{pmatrix} = \begin{pmatrix} -.540799242 \\ -1.54079799 \end{pmatrix},$$

which is very close to the unconstrained minimizer.

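The result using the general Strategy E is as follows. The positive portion of the Hessian matrix is

$$P^0 = 4 \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} (1/\sqrt{2}, -1/\sqrt{2}) = \begin{pmatrix} 2 & -2 \\ -2 & 2 \end{pmatrix}.$$

The move from $x_0$ along the direction of negative curvature is that prescribed by Strategy B and can be used as a starting point. This is point C in Figure 1.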



TABLE  1 


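The direction vector is then

$$-(P^0)^+ \nabla f(x_0) = -\begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \end{pmatrix} \frac{1}{4} (1/\sqrt{2}, -1/\sqrt{2}) \begin{pmatrix} -1.6224 \\ 4.3776 \end{pmatrix} = \begin{pmatrix} .75 \\ -.75 \end{pmatrix}.$$

With a step size of one, this yields

$$\begin{pmatrix} -.5472 \\ -1.5472 \end{pmatrix} = \begin{pmatrix} -1.2972 \\ -.7972 \end{pmatrix} + \begin{pmatrix} .75 \\ -.75 \end{pmatrix}.$$

(A step of one also solves the step-size problem (5) here: along this direction $x_1 + x_2$ is constant, and f reduces to a convex quadratic in the step size that is minimized at one.) In this example this is the desired unconstrained minimizer and no further computations are necessary.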


4. Summary

In this paper several strategies have been presented for modifying the classical form of Newton's method when the Hessian matrix is not positive definite at some iterate. The emphasis was on the geometric motivation for the methods, rather than on convergence theorems which can be proved for specific algebraic implementations of the methods. In the papers referenced there is a general consensus that, although the natural way to look at the methods is from the point of view of the eigenvector-eigenvalue decomposition of the Hessian matrix, this decomposition is computationally too expensive. Most of the authors quoted intend to work on imitative algorithms which do not require this (and in some cases do not require explicit computation of the second derivatives).

The computational problems associated with these methods have not been elaborated upon here, nor have the difficulties in getting good test problems. Since the modifications occur infrequently compared to the usual Newton or quasi-Newton steps, care must be taken in generating test situations to compare these different strategies.

